Space Efficient Quantile Summary for Constrained Sliding Windows on a Data Stream

نویسندگان

  • Jian Xu
  • Xuemin Lin
  • Xiaofang Zhou
چکیده

In many online applications, we need to maintain quantile statistics for a sliding window on a data stream. The sliding windows in natural form are defined as the most recent N data items. In this paper, we study the problem of estimating quantiles over other types of sliding windows. We present a uniform framework to process quantile queries for time constrained and filter based sliding windows. Our algorithm makes one pass on the data stream and maintains an -approximate summary. It uses O( 1 2 log N) space where N is the number of data items in the window. We extend this framework to further process generalized constrained sliding window queries and proved that our technique is applicable for flexible window settings. Our performance study indicates that the space required in practice is much less than the given theoretical bound and the algorithm supports high speed data streams.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Fast and Space-Efficient Computation of Equi-Depth Histograms for Data Streams

Equi-depth histograms represent a fundamental synopsis widely used in both database and data stream applications, as they provide the cornerstone of many techniques such as query optimization, approximate query answering, distribution fitting, and parallel database partitioning. Equi-depth histograms try to partition a sequence of data in a way that every part has the same number of data items....

متن کامل

Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows

Todays, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that frequently appear in data set and mark them as frequent patterns to enable users to make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection, work either on uncertai...

متن کامل

Efficient Approximation of Correlated Sums on Data Streams

In many applications such as IP network management, data arrives in streams, and queries over those streams need to be processed online using limited storage. Correlated-sum (CS) aggregates are a natural class of queries formed by composing basic aggregates on (x, y) pairs, and are of the form SUM{g(y) : x ≤ f(AGG(x))}, where AGG(x) can be any basic aggregate and f(), g() are user-specified fun...

متن کامل

Lectures on Streaming Algorithms

• Standard stream model: m elements from universe of size n, come one by one. Goal: compute a function of stream. Constraints: (1) Limited space (working memory), sublinear in n and m. (2) Access data sequentially. (3) Process each element quickly. • Graph stream: elements of the stream are edges. Let n be the number of vertices. We have m = O(n 2). To be able to handle interesting functions, w...

متن کامل

Research on Sliding Window Join Semantics and Join Algorithm in Heterogeneous Data Streams

Sliding windows of data stream have rich semantics, which results all kinds of window semantics of different data stream, so join semantics between the different types of windows becomes very complicated. The basic join semantic of data streams, the join semantic of tuple-based sliding window and the join semantic of time-based sliding window have partly solved the semantics of stream joins, bu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004